Ancient Roman coin recognition in the wild using deep learning based recognition of artistically depicted face profiles
As both a particularly interesting application in the realm of cultural heritage and a technically challenging problem in its own right, computer vision based analysis of Roman Imperial coins has been attracting an increasing amount of research. In this paper we make several important contributions. Firstly, we address a key limitation of existing work, which is largely characterized by the application of generic object recognition techniques and the lack of use of domain knowledge. In contrast, our work approaches coin recognition in much the same way as a human expert would: by identifying the emperor universally shown on the obverse. To this end we develop a deep convolutional network, carefully crafted for what is effectively a specific instance of profile face recognition. No less importantly, we also address a major methodological flaw of previous research which is, as we explain in detail, insufficiently systematic and rigorous, and mired with confounding factors. Lastly, we introduce three carefully collected and annotated data sets, and using these demonstrate the effectiveness of the proposed approach, which is shown to exceed the performance of the state of the art by approximately an order of magnitude.
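The abstract does not spell out the network itself, so the following is only a generic, illustrative sketch of an emperor-classification CNN over obverse crops, assuming PyTorch; the class name EmperorNet, the layer choices, and the number of emperor classes are placeholders rather than the authors' architecture.

```python
import torch
import torch.nn as nn


class EmperorNet(nn.Module):
    """Placeholder CNN that maps an obverse crop to logits over emperor identities."""

    def __init__(self, num_emperors: int = 30):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_emperors)

    def forward(self, coin_obverse):
        # coin_obverse: (batch, 3, H, W) crops of the coin's obverse side
        h = self.features(coin_obverse).flatten(1)
        return self.classifier(h)


logits = EmperorNet()(torch.randn(4, 3, 224, 224))  # one logit per emperor class
```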
Block-Recurrent Transformers
We introduce the Block-Recurrent Transformer, which applies a transformer
layer in a recurrent fashion along a sequence, and has linear complexity with
respect to sequence length. Our recurrent cell operates on blocks of tokens
rather than single tokens during training, and leverages parallel computation
within a block in order to make efficient use of accelerator hardware. The cell
itself is strikingly simple. It is merely a transformer layer: it uses
self-attention and cross-attention to efficiently compute a recurrent function
over a large set of state vectors and tokens. Our design was inspired in part
by LSTM cells, and it uses LSTM-style gates, but it scales the typical LSTM
cell up by several orders of magnitude. Our implementation of recurrence has
the same cost in both computation time and parameter count as a conventional
transformer layer, but offers dramatically improved perplexity in language
modeling tasks over very long sequences. Our model outperforms a long-range
Transformer-XL baseline by a wide margin, while running twice as fast. We
demonstrate its effectiveness on PG19 (books), arXiv papers, and GitHub source
code. Our code has been released as open source. Comment: Update to NeurIPS camera-ready version.
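A minimal sketch of the block-recurrent idea described above, assuming PyTorch. The class name BlockRecurrentCell, the layer sizes, and the gating details are illustrative assumptions rather than the released implementation; the sketch only shows the core loop: process one block of tokens at a time, cross-attend between the tokens and a carried set of state vectors, and merge the new state through an LSTM-style sigmoid gate.

```python
import torch
import torch.nn as nn


class BlockRecurrentCell(nn.Module):
    """One recurrent step: transform a block of tokens and update the carried state."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.token_self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.token_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.state_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_tokens = nn.LayerNorm(d_model)
        self.norm_state = nn.LayerNorm(d_model)
        self.norm_ffn = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        # LSTM-style gate deciding how much of the old state survives.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, tokens, state):
        # tokens: (batch, block_len, d_model); state: (batch, num_state, d_model)
        t, s = self.norm_tokens(tokens), self.norm_state(state)
        t_self, _ = self.token_self_attn(t, t, t)      # tokens attend within the block
        t_cross, _ = self.token_cross_attn(t, s, s)    # tokens attend to the recurrent state
        tokens = tokens + t_self + t_cross
        tokens = tokens + self.ffn(self.norm_ffn(tokens))
        s_new, _ = self.state_cross_attn(s, t, t)      # state attends to the block's tokens
        g = torch.sigmoid(self.gate(torch.cat([state, s_new], dim=-1)))
        state = g * state + (1.0 - g) * s_new          # gated, LSTM-like state update
        return tokens, state


# Walk a long sequence one block at a time, carrying the state across blocks.
cell = BlockRecurrentCell()
x = torch.randn(2, 4 * 128, 256)
state = torch.zeros(2, 64, 256)
outputs = []
for block in x.split(128, dim=1):
    y, state = cell(block, state)
    outputs.append(y)
```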
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
The Languini Kitchen serves as both a research collective and codebase
designed to empower researchers with limited computational resources to
contribute meaningfully to the field of language modelling. We introduce an
experimental protocol that enables model comparisons based on equivalent
compute, measured in accelerator hours. The number of tokens on which a model
is trained is defined by the model's throughput and the chosen compute class.
Notably, this approach avoids constraints on critical hyperparameters which
affect total parameters or floating-point operations. For evaluation, we
pre-process an existing large, diverse, and high-quality dataset of books that
surpasses existing academic benchmarks in quality, diversity, and document
length. On it, we compare methods based on their empirical scaling trends which
are estimated through experiments at various levels of compute. This work also
provides two baseline models: a feed-forward model derived from the GPT-2
architecture and a recurrent model in the form of a novel LSTM with ten-fold
throughput. While the GPT baseline achieves better perplexity throughout all
our levels of compute, our LSTM baseline exhibits a predictable and more
favourable scaling law. This is due to the improved throughput and the need for
fewer training tokens to achieve the same decrease in test perplexity.
Extrapolating the scaling laws of both models results in an intersection
at roughly 50,000 accelerator hours. We hope this work can serve as the
foundation for meaningful and reproducible language modelling research.
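The compute-class protocol reduces to a simple bookkeeping rule: a model's training-token budget is its measured throughput multiplied by the accelerator hours of the chosen compute class. The helper below is only a toy illustration of that rule; the function name and the throughput figures are assumptions, not values from the Languini codebase.

```python
def training_tokens(throughput_tokens_per_hour: float, compute_class_hours: float) -> int:
    """Tokens a model is trained on: measured throughput times the compute budget."""
    return int(throughput_tokens_per_hour * compute_class_hours)


# Example: within the same 6-hour compute class, a model with ten-fold throughput
# earns ten times as many training tokens (the numbers here are made up).
gpt_tokens = training_tokens(throughput_tokens_per_hour=1.0e7, compute_class_hours=6)
lstm_tokens = training_tokens(throughput_tokens_per_hour=1.0e8, compute_class_hours=6)
```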
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
The weight matrix (WM) of a neural network (NN) is its program. The programs
of many traditional NNs are learned through gradient descent in some error
function, then remain fixed. The WM of a self-referential NN, however, can keep
rapidly modifying all of itself during runtime. In principle, such NNs can
meta-learn to learn, and meta-meta-learn to meta-learn to learn, and so on, in
the sense of recursive self-improvement. While NN architectures potentially
capable of implementing such behaviour have been proposed since the '90s, there
have been few if any practical studies. Here we revisit such NNs, building upon
recent successes of fast weight programmers and closely related linear
Transformers. We propose a scalable self-referential WM (SRWM) that learns to
use outer products and the delta update rule to modify itself. We evaluate our
SRWM in supervised few-shot learning and in multi-task reinforcement learning
with procedurally generated game environments. Our experiments demonstrate both
practical applicability and competitive performance of the proposed SRWM. Our
code is public. Comment: Accepted to ICML 2022.
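A schematic numpy sketch of a weight matrix that modifies itself with outer products and the delta update rule, loosely following the description above; the slicing layout, the softmax and sigmoid normalisations, and the dimensions are simplifying assumptions rather than the paper's exact formulation.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


d_in, d_out = 8, 4
rows = d_out + 2 * d_in + 1              # slots for output, query, key, and learning rate
W = np.random.randn(rows, d_in) * 0.1    # the single matrix that both computes and rewrites itself


def srwm_step(W, x):
    # The matrix reads the input and emits its own output, query, key, and learning rate.
    out = W @ softmax(x)
    y = out[:d_out]
    q = softmax(out[d_out:d_out + d_in])
    k = softmax(out[d_out + d_in:d_out + 2 * d_in])
    beta = sigmoid(out[-1])
    # Delta rule with an outer product: replace what the key currently retrieves
    # with the value the matrix itself generates for the query.
    v = W @ q
    W = W + beta * np.outer(v - W @ k, k)
    return W, y


x = np.random.randn(d_in)
for _ in range(5):
    W, y = srwm_step(W, x)
```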
Solving Quantitative Reasoning Problems with Language Models
Language models have achieved remarkable performance on a wide range of tasks
that require natural language understanding. Nevertheless, state-of-the-art
models have generally struggled with tasks that require quantitative reasoning,
such as solving mathematics, science, and engineering problems at the college
level. To help close this gap, we introduce Minerva, a large language model
pretrained on general natural language data and further trained on technical
content. The model achieves state-of-the-art performance on technical
benchmarks without the use of external tools. We also evaluate our model on
over two hundred undergraduate-level problems in physics, biology, chemistry,
economics, and other sciences that require quantitative reasoning, and find
that the model can correctly answer nearly a third of them. Comment: 12 pages, 5 figures + references and appendices.